16 research outputs found

Reclaiming Fault Resilience and Energy Efficiency With Enhanced Performance in Low Power Architectures

Author: Gundi Noel Daniel
Publication venue: DigitalCommons@USU
Publication date: 01/08/2023

Rapid development in the AI domain has revolutionized the computing industry through the introduction of state-of-the-art AI architectures. This growth is also accompanied by a massive increase in power consumption. Near-Threshold Computing (NTC) has emerged as a viable solution, offering significant savings in power consumption and paving the way for an energy-efficient design paradigm. However, these benefits are accompanied by a deterioration in performance due to severe process variation and slower transistor switching at near-threshold operation. These problems severely restrict the use of near-threshold operation in commercial applications. In this work, a novel AI architecture, the Tensor Processing Unit, operating at NTC is thoroughly investigated to tackle the issues hindering system performance. Research problems are demonstrated in a scientific manner and unique opportunities are explored to propose novel design methodologies

DigitalCommons@USU

Implementation of 32 Bit Brent Kung Adder Using Complementary Pass Transistor Logic

Author: Gundi Noel Daniel
Publication venue: Oklahoma State University Library
Publication date: 01/05/2015

Adders are the most vital part of any digital system. An efficient adder design that satisfies the tradeoff between speed and space aids in increasing the performance of the system. In the modern age, power consumption plays a vital role in addition to the tradeoff between speed and space. Devices with low power consumption and good performance are always preferred. Parallel prefix adders are widely used in digital design, primarily because of the flexibility they allow in designing the adder. The Brent-Kung adder is a low-power adder, as it uses minimal circuitry to obtain the result. The use of Complementary Pass Transistor Logic aids in increasing the performance of the design by using the multiplexer approach in designing the various cells. The 16-bit design is extended to 32 bits, implemented at the physical level, and successfully simulated. The area and delay results are illustrated accordingly. Electrical Engineerin

SHAREOK repository

Understanding Timing Error Characteristics From Overclocked Systolic Multiply–Accumulate Arrays in FPGAs

Author: Chakraborty Koushik
Chamberlin Andrew
Gerber Andrew
Goodale Tim
Gundi Noel Daniel
Palmer Mason
Roy Sanghamitra
Publication venue: Hosted by Utah State University Libraries
Publication date: 09/01/2024

Artificial Intelligence (AI) hardware accelerators have seen tremendous developments in recent years due to the rapid growth of AI in multiple fields. Many such accelerators comprise a Systolic Multiply–Accumulate Array (SMA) as their computational brain. In this paper, we investigate the faulty output characterization of an SMA in a real silicon FPGA board. Experiments were run on a single Zybo Z7-20 board to control for process variation at nominal voltage and in small batches to control for temperature. The FPGA is rated up to 800 MHz in the data sheet due to the maximum frequency of the PLL, but the design is written using Verilog for the FPGA and C++ for the processor and synthesized with a chosen constraint of a 125 MHz clock. We then operate the system at a frequency range of 125 MHz to 450 MHz for the FPGA and the nominal 667 MHz for the processor core to produce timing errors in the FPGA without affecting the processor. Our extensive experimental platform with a hardware–software ecosystem provides a methodological pathway that reveals fascinating characteristics of SMA behavior in an overclocked environment. While one may intuitively expect that timing errors resulting from overclocked hardware would produce a wide variation in output values, our post-silicon evaluation reveals a lack of variation in erroneous output values. We found an intriguing pattern where error output values are stable for a given input across a range of operating frequencies far exceeding the rated frequency of the FPGA

DigitalCommons@USU

Challenges and Opportunities in Near-Threshold DNN Accelerators around Timing Errors

Author: Basu Prabal
Chakraborty Koushik
Gundi Noel Daniel
Pandey Pramesh
Patrick Mitchell Craig
Roy Sanghamitra
Shabanian Tahmoures
Publication venue: Hosted by Utah State University Libraries
Publication date: 16/10/2020

AI evolution is accelerating and Deep Neural Network (DNN) inference accelerators are at the forefront of ad hoc architectures that are evolving to support the immense throughput required for AI computation. However, much more energy-efficient design paradigms are inevitable to realize the complete potential of AI evolution and curtail energy consumption. The Near-Threshold Computing (NTC) design paradigm can serve as the best candidate for providing the required energy efficiency. However, NTC operation is plagued with ample performance and reliability concerns arising from timing errors. In this paper, we dive deep into DNN architecture to uncover some unique challenges and opportunities for operation in the NTC paradigm. By performing rigorous simulations in a TPU systolic array, we reveal the severity of timing errors and their impact on inference accuracy at NTC. We analyze various attributes (such as data–delay relationship, delay disparity within arithmetic units, utilization pattern, hardware homogeneity, and workload characteristics) and uncover unique localized and global techniques to deal with the timing errors in NTC

DigitalCommons@USU

Implementing a Timing Error-Resilient and Energy-Efficient Near-Threshold Hardware Accelerator for Deep Neural Network Inference

Author: Koushik Chakraborty
Noel Daniel Gundi
Pramesh Pandey
Sanghamitra Roy
Publication venue: MDPI AG
Publication date: 01/06/2022

Increasing processing requirements in the Artificial Intelligence (AI) realm have led to the emergence of domain-specific architectures for Deep Neural Network (DNN) applications. The Tensor Processing Unit (TPU), a DNN accelerator by Google, has emerged as a front runner, outclassing its contemporaries, CPUs and GPUs, in performance by 15×–30×. TPUs have been deployed in Google data centers to cater to the performance demands. However, a TPU’s performance enhancement is accompanied by mammoth power consumption. In the pursuit of lowering energy utilization, this paper proposes PREDITOR, a low-power TPU operating in the Near-Threshold Computing (NTC) realm. PREDITOR uses mathematical analysis to mitigate undetectable timing errors by boosting the voltage of selected multiplier-and-accumulator units at specific intervals to enhance the performance of the NTC TPU, thereby ensuring high inference accuracy at low voltage. PREDITOR offers up to 3×–5× improved performance in comparison to leading-edge error mitigation schemes with a minor loss in accuracy

Directory of Open Access Journals

Implementing a Timing Error-Resilient and Energy-Efficient Near-Threshold Hardware Accelerator for Deep Neural Network Inference

Author: Koushik Chakraborty
Noel Daniel Gundi
Pramesh Pandey
Sanghamitra Roy
Publication venue: MDPI AG
Publication date: 06/06/2022

Multidisciplinary Digital Publishing Institute

UPTPU: Improving Energy Efficiency of a Tensor Processing Unit through Underutilization Based Power-Gating

Author: Chakraborty Koushik
Gundi Noel Daniel
Pandey Pramesh
Roy Sanghamitra
Publication venue: Hosted by Utah State University Libraries
Publication date: 05/12/2021

The AI boom is bringing a plethora of domain-specific architectures for Neural Network computations. Google's Tensor Processing Unit (TPU), a Deep Neural Network (DNN) accelerator, has replaced the CPUs/GPUs in its data centers, claiming more than 15× rate of inference. However, the unprecedented growth in DNN workloads with the widespread use of AI services projects an increasing energy consumption of TPU-based data centers. In this work, we parametrize the extreme hardware underutilization in the TPU systolic array and propose UPTPU: an intelligent, dataflow-adaptive power-gating paradigm to provide a staggering 3.5×–6.5× energy efficiency to the TPU for different input batch sizes

DigitalCommons@USU

EFFORT: A Comprehensive Technique to Tackle Timing Violations and Improve Energy Efficiency of Near-Threshold Tensor Processing Units

Author: Basu Prabal
Chakraborty Koushik
Gundi Noel Daniel
Pandey Pramesh
Roy Sanghamitra
Shabanian Tahmoures
Publication venue: Hosted by Utah State University Libraries
Publication date: 01/10/2021

Modern deep neural network (DNN) applications demand a remarkable processing throughput usually unmet by traditional Von Neumann architectures. Consequently, hardware accelerators, comprising a sea of multiplier-and-accumulate (MAC) units, have recently gained prominence in accelerating DNN inference engines. For example, tensor processing units (TPUs) account for a lion's share of Google's datacenter inference operations. The proliferation of real-time DNN predictions is accompanied by a tremendous energy budget. In quest of trimming the energy footprint of DNN accelerators, we propose Energy eFFicient and errOr Resilient TPU (EFFORT), an energy-optimized yet high-performance TPU architecture operating at the near-threshold computing (NTC) region. EFFORT promotes a better-than-worst-case design by operating the NTC TPU at a substantially high frequency while keeping the voltage at the NTC nominal value. In order to tackle the timing errors due to such aggressive operation, we employ an opportunistic error mitigation strategy. In addition, we implement an in situ clock gating architecture, drastically reducing the MACs' dynamic power consumption. Compared to a cutting-edge error mitigation technique for TPUs, EFFORT enables up to 2.5× better performance at NTC with only 4% average accuracy drop across six out of eight DNN benchmarks

DigitalCommons@USU

EFFORT: Enhancing Energy Efficiency and Error Resilience of a Near-Threshold Tensor Processing Unit

Author: Basu Prabal
Chakraborty Koushik
Gundi Noel Daniel
Pandey Pramesh
Roy Sanghamitra
Shabanian Tahmoures
Zhang Zhen
Publication venue: Hosted by Utah State University Libraries
Publication date: 13/01/2020

Modern deep neural network (DNN) applications demand a remarkable processing throughput usually unmet by traditional Von Neumann architectures. Consequently, hardware accelerators, comprising a sea of multiplier-and-accumulate (MAC) units, have recently gained prominence in accelerating DNN inference engines. For example, Tensor Processing Units (TPUs) account for a lion's share of Google's datacenter inference operations. The proliferation of real-time DNN predictions is accompanied by a tremendous energy budget. In quest of trimming the energy footprint of DNN accelerators, we propose EFFORT, an energy-optimized yet high-performance TPU architecture operating at the Near-Threshold Computing (NTC) region. EFFORT promotes a better-than-worst-case design by operating the NTC TPU at a substantially high frequency while keeping the voltage at the NTC nominal value. In order to tackle the timing errors due to such aggressive operation, we employ an opportunistic error mitigation strategy. Additionally, we implement an in-situ clock gating architecture, drastically reducing the MACs' dynamic power consumption. Compared to a cutting-edge error mitigation technique for TPUs, EFFORT enables up to 2.5× better performance at NTC with only 2% average accuracy drop across 3 out of 4 DNN datasets

DigitalCommons@USU

Challenges and Opportunities in Near-Threshold DNN Accelerators around Timing Errors

Author: Koushik Chakraborty
Mitchell Craig Patrick
Noel Daniel Gundi
Prabal Basu
Pramesh Pandey
Sanghamitra Roy
Tahmoures Shabanian
Publication venue: MDPI AG
Publication date: 16/10/2020

Multidisciplinary Digital Publishing Institute